Goto

Collaborating Authors

 Plovdiv Province


Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction

Vossebeld, Floris, Wang, Shenghui

arXiv.org Artificial Intelligence

Generating complex, logically-sound SPARQL queries for multi-hop questions remains a critical bottleneck for Knowledge Graph Question Answering, as the brittle nature of one-shot generation by Large Language Models (LLMs) hinders reliable interaction with structured data. Current methods lack the adaptive policies needed to dynamically debug queries based on real-time execution feedback. This paper introduces a novel agentic framework where an LLM learns a resilient policy for the sequential process of iterative SPARQL construction. We show that a compact 3B-parameter model, trained exclusively via outcome-driven Reinforcement Learning (GRPO) without supervised fine-tuning, can learn effective policies for this task, discovering how to systematically recover from execution errors and refine its queries toward a correct answer. On a curated, executable single-answer subset of LC-QuAD 2.0, our agent achieves 49.7\% accuracy post-entity-linking, a significant 17.5 percentage point improvement over the strongest iterative zero-shot baseline. Further analysis reveals that while the agent's capability is driven by RL, its performance is enhanced by an explicit deliberative reasoning step that acts as a cognitive scaffold to improve policy precision. This work presents a generalizable blueprint for teaching agents to master formal, symbolic tools through interaction, bridging the gap between probabilistic LLMs and the structured world of Knowledge Graphs.


Grounding Multilingual Multimodal LLMs With Cultural Knowledge

Nyandwi, Jean de Dieu, Song, Yueqi, Khanuja, Simran, Neubig, Graham

arXiv.org Artificial Intelligence

Multimodal Large Language Models excel in high-resource settings, but often misinterpret long-tail cultural entities and underperform in low-resource languages. To address this gap, we propose a data-centric approach that directly grounds MLLMs in cultural knowledge. Leveraging a large scale knowledge graph from Wikidata, we collect images that represent culturally significant entities, and generate synthetic multilingual visual question answering data. The resulting dataset, CulturalGround, comprises 22 million high-quality, culturally-rich VQA pairs spanning 42 countries and 39 languages. We train an open-source MLLM CulturalPangea on CulturalGround, interleaving standard multilingual instruction-tuning data to preserve general abilities. CulturalPangea achieves state-of-the-art performance among open models on various culture-focused multilingual multimodal benchmarks, outperforming prior models by an average of 5.0 without degrading results on mainstream vision-language tasks. Our findings show that our targeted, culturally grounded approach could substantially narrow the cultural gap in MLLMs and offer a practical path towards globally inclusive multimodal systems.


Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices

Cruz, Luís, Fernandes, João Paulo, Kirkeby, Maja H., Martínez-Fernández, Silverio, Sallou, June, Anwar, Hina, Roque, Enrique Barba, Bogner, Justus, Castaño, Joel, Castor, Fernando, Chasmawala, Aadil, Cunha, Simão, Feitosa, Daniel, González, Alexandra, Jedlitschka, Andreas, Lago, Patricia, Muccini, Henry, Oprescu, Ana, Rani, Pooja, Saraiva, João, Sarro, Federica, Selvan, Raghavendra, Vaidhyanathan, Karthik, Verdecchia, Roberto, Yamshchikov, Ivan P.

arXiv.org Artificial Intelligence

The environmental impact of Artificial Intelligence (AI)-enabled systems is increasing rapidly, and software engineering plays a critical role in developing sustainable solutions. The "Greening AI with Software Engineering" CECAM-Lorentz workshop (no. 1358, 2025) funded by the Centre Européen de Calcul Atomique et Moléculaire and the Lorentz Center, provided an interdisciplinary forum for 29 participants, from practitioners to academics, to share knowledge, ideas, practices, and current results dedicated to advancing green software and AI research. The workshop was held February 3-7, 2025, in Lausanne, Switzerland. Through keynotes, flash talks, and collaborative discussions, participants identified and prioritized key challenges for the field. These included energy assessment and standardization, benchmarking practices, sustainability-aware architectures, runtime adaptation, empirical methodologies, and education. This report presents a research agenda emerging from the workshop, outlining open research directions and practical recommendations to guide the development of environmentally sustainable AI-enabled systems rooted in software engineering principles.


EDCA -- An Evolutionary Data-Centric AutoML Framework for Efficient Pipelines

Simões, Joana, Correia, João

arXiv.org Artificial Intelligence

Automated Machine Learning (AutoML) gained popularity due to the increased demand for Machine Learning (ML) specialists, allowing them to apply ML techniques effortlessly and quickly. AutoML implementations use optimisation methods to identify the most effective ML solution for a given dataset, aiming to improve one or more predefined metrics. However, most implementations focus on model selection and hyperparameter tuning. Despite being an important factor in obtaining high-performance ML systems, data quality is usually an overlooked part of AutoML and continues to be a manual and time-consuming task. This work presents EDCA, an Evolutionary Data Centric AutoML framework. In addition to the traditional tasks such as selecting the best models and hyperparameters, EDCA enhances the given data by optimising data processing tasks such as data reduction and cleaning according to the problems' needs. All these steps create an ML pipeline that is optimised by an evolutionary algorithm. To assess its effectiveness, EDCA was compared to FLAML and TPOT, two frameworks at the top of the AutoML benchmarks. The frameworks were evaluated in the same conditions using datasets from AMLB classification benchmarks. EDCA achieved statistically similar results in performance to FLAML and TPOT but used significantly less data to train the final solutions. Moreover, EDCA experimental results reveal that a good performance can be achieved using less data and efficient ML algorithm aspects that align with Green AutoML guidelines


HT-HEDL: High-Throughput Hypothesis Evaluation in Description Logic

Algahtani, Eyad

arXiv.org Artificial Intelligence

We present High-Throughput Hypothesis Evaluation in Description Logic (HT-HEDL). HT-HEDL is a high-performance hypothesis evaluation engine that accelerates hypothesis evaluation computations for inductive logic programming (ILP) learners using description logic (DL) for their knowledge representation; in particular, HT-HEDL targets accelerating computations for the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language. HT-HEDL aggregates the computing power of multi-core CPUs with multi-GPUs to improve hypothesis computations at two levels: 1) the evaluation of a single hypothesis and 2) the evaluation of multiple hypotheses (i.e., batch of hypotheses). In the first level, HT-HEDL uses a single GPU or a vectorized multi-threaded CPU to evaluate a single hypothesis. In vectorized multi-threaded CPU evaluation, classical (scalar) CPU multi-threading is combined with CPU's extended vector instructions set to extract more CPU-based performance. The experimental results revealed that HT-HEDL increased performance using CPU-based evaluation (on a single hypothesis): from 20.4 folds using classical multi-threading to $\sim85$ folds using vectorized multi-threading. In the GPU-based evaluation, HT-HEDL achieved speedups of up to $\sim38$ folds for single hypothesis evaluation using a single GPU. To accelerate the evaluation of multiple hypotheses, HT-HEDL combines, in parallel, GPUs with multi-core CPUs to increase evaluation throughput (number of evaluated hypotheses per second). The experimental results revealed that HT-HEDL increased evaluation throughput by up to 29.3 folds using two GPUs and up to $\sim44$ folds using two GPUs combined with a CPU's vectorized multi-threaded evaluation.


SPILDL: A Scalable and Parallel Inductive Learner in Description Logic

Algahtani, Eyad

arXiv.org Artificial Intelligence

We present SPILDL, a Scalable and Parallel Inductive Learner in Description Logic (DL). SPILDL is based on the DL-Learner (the state of the art in DL-based ILP learning). As a DL-based ILP learner, SPILDL targets the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language, and can learn DL hypotheses expressed as disjunctions of conjunctions (using the $\sqcup$ operator). Moreover, SPILDL's hypothesis language also incorporates the use of string concrete roles (also known as string data properties in the Web Ontology Language, OWL); As a result, this incorporation of powerful DL constructs, enables SPILDL to learn powerful DL-based hypotheses for describing many real-world complex concepts. SPILDL employs a hybrid parallel approach which combines both shared-memory and distributed-memory approaches, to accelerates ILP learning (for both hypothesis search and evaluation). According to experimental results, SPILDL's parallel search improved performance by up to $\sim$27.3 folds (best case). For hypothesis evaluation, SPILDL improved evaluation performance through HT-HEDL (our multi-core CPU + multi-GPU hypothesis evaluation engine), by up to 38 folds (best case). By combining both parallel search and evaluation, SPILDL improved performance by up to $\sim$560 folds (best case). In terms of worst case scenario, SPILDL's parallel search doesn't provide consistent speedups on all datasets, and is highly dependent on the search space nature of the ILP dataset. For some datasets, increasing the number of parallel search threads result in reduced performance, similar or worse than baseline. Some ILP datasets benefit from parallel search, while others don't (or the performance gains are negligible). In terms of parallel evaluation, on small datasets, parallel evaluation provide similar or worse performance than baseline.


ChronoFact: Timeline-based Temporal Fact Verification

Barik, Anab Maulana, Hsu, Wynne, Lee, Mong Li

arXiv.org Artificial Intelligence

Automated fact verification plays an essential role in fostering trust in the digital space. Despite the growing interest, the verification of temporal facts has not received much attention in the community. Temporal fact verification brings new challenges where cues of the temporal information need to be extracted and temporal reasoning involving various temporal aspects of the text must be applied. In this work, we propose an end-to-end solution for temporal fact verification that considers the temporal information in claims to obtain relevant evidence sentences and harness the power of large language model for temporal reasoning. Recognizing that temporal facts often involve events, we model these events in the claim and evidence sentences. We curate two temporal fact datasets to learn time-sensitive representations that encapsulate not only the semantic relationships among the events, but also their chronological proximity. This allows us to retrieve the top-k relevant evidence sentences and provide the context for a large language model to perform temporal reasoning and outputs whether a claim is supported or refuted by the retrieved evidence sentences. Experiment results demonstrate that the proposed approach significantly enhances the accuracy of temporal claim verification, thereby advancing current state-of-the-art in automated fact verification.


An interdisciplinary exploration of trade-offs between energy, privacy and accuracy aspects of data

de Reus, Pepijn, Dresen, Kyra, Oprescu, Ana, Irion, Kristina, Kolk, Ans

arXiv.org Artificial Intelligence

The digital era has raised many societal challenges, including ICT's rising energy consumption and protecting privacy of personal data processing. This paper considers both aspects in relation to machine learning accuracy in an interdisciplinary exploration. We first present a method to measure the effects of privacy-enhancing techniques on data utility and energy consumption. The environmental-privacy-accuracy trade-offs are discovered through an experimental set-up. We subsequently take a storytelling approach to translate these technical findings to experts in non-ICT fields. We draft two examples for a governmental and auditing setting to contextualise our results. Ultimately, users face the task of optimising their data processing operations in a trade-off between energy, privacy, and accuracy considerations where the impact of their decisions is context-sensitive.


Enhancing Personalized Recipe Recommendation Through Multi-Class Classification

Neelam, Harish, Veerella, Koushik Sai

arXiv.org Artificial Intelligence

This paper intends to address the challenge of personalized recipe recommendation in the realm of diverse culinary preferences. The problem domain involves recipe recommendations, utilizing techniques such as association analysis and classification. Association analysis explores the relationships and connections between different ingredients to enhance the user experience. Meanwhile, the classification aspect involves categorizing recipes based on user-defined ingredients and preferences. A unique aspect of the paper is the consideration of recipes and ingredients belonging to multiple classes, recognizing the complexity of culinary combinations. This necessitates a sophisticated approach to classification and recommendation, ensuring the system accommodates the nature of recipe categorization. The paper seeks not only to recommend recipes but also to explore the process involved in achieving accurate and personalized recommendations.


Post-OCR Text Correction for Bulgarian Historical Documents

Beshirov, Angel, Dobreva, Milena, Dimitrov, Dimitar, Hardalov, Momchil, Koychev, Ivan, Nakov, Preslav

arXiv.org Artificial Intelligence

The digitization of historical documents is crucial for preserving the cultural heritage of the society. An important step in this process is converting scanned images to text using Optical Character Recognition (OCR), which can enable further search, information extraction, etc. Unfortunately, this is a hard problem as standard OCR tools are not tailored to deal with historical orthography as well as with challenging layouts. Thus, it is standard to apply an additional text correction step on the OCR output when dealing with such documents. In this work, we focus on Bulgarian, and we create the first benchmark dataset for evaluating the OCR text correction for historical Bulgarian documents written in the first standardized Bulgarian orthography: the Drinov orthography from the 19th century. We further develop a method for automatically generating synthetic data in this orthography, as well as in the subsequent Ivanchev orthography, by leveraging vast amounts of contemporary literature Bulgarian texts. We then use state-of-the-art LLMs and encoder-decoder framework which we augment with diagonal attention loss and copy and coverage mechanisms to improve the post-OCR text correction. The proposed method reduces the errors introduced during recognition and improves the quality of the documents by 25\%, which is an increase of 16\% compared to the state-of-the-art on the ICDAR 2019 Bulgarian dataset. We release our data and code at \url{https://github.com/angelbeshirov/post-ocr-text-correction}.}